Add new option -l for combine_tessdata to list LSTM information #3237

stweil · 2021-01-14T21:24:38Z

No description provided.

Signed-off-by: Stefan Weil <[email protected]>

stweil · 2021-01-14T21:26:21Z

This is a first step toward showing the same information that Ray produced in #1404 (comment). Ray's information looked like this:

LSTM training info:Network str:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx384O1c1], flags=41, iteration=2736900, sample_iteration=2737076, null_char=2, learning_rate=0.001, momentum=0.5, adam_beta=0.999

LSTMRecognizer provides training_flags_, training_iteration_, sample_iteration_, null_char_, learning_rate_, momentum_ and adam_beta_, so the remaining information can easily be added.

2021-01-15: The missing information has meanwhile been implemented, too.

Signed-off-by: Stefan Weil <[email protected]>

Shreeshrii · 2021-01-15T17:22:18Z

Thank you, @stweil. This is very useful.

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata_best/eng.traineddata
LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1], int_mode=0, recoding=1, iteration=814100, sample_iteration=814136, null_char=110, learning_rate=0.001, momentum=0.5, adam_beta=0.999

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata/eng.traineddata
LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1], int_mode=1, recoding=1, iteration=814100, sample_iteration=814136, null_char=110, learning_rate=0.001, momentum=0.5, adam_beta=0.999

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata_fast/eng.traineddata
LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1], int_mode=1, recoding=1, iteration=6352400, sample_iteration=6352704, null_char=110, learning_rate=0.001, momentum=0.5, adam_beta=0.999

Now the int_mode can also be queried for START_MODEL in lstmtraining to give a user-friendly error message.
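For anyone who wants to compare such listings in a script, here is a minimal Python sketch that parses one `LSTM:` line into a dictionary and reports which fields differ between two models. The function names `parse_lstm_listing` and `differing_fields` are hypothetical, and the sketch assumes the output format stays exactly as shown in the transcripts above.

```python
import re


def parse_lstm_listing(line):
    """Parse one 'LSTM: key=value, ...' line into a dict of strings."""
    prefix = "LSTM: "
    if not line.startswith(prefix):
        raise ValueError("not an LSTM listing line")
    body = line[len(prefix):].strip()
    # The network spec itself contains commas, so split only at ", "
    # when it is followed by a field name and "=".
    fields = re.split(r", (?=[a-z_]+=)", body)
    return dict(field.split("=", 1) for field in fields)


def differing_fields(a, b):
    """Return {name: (value_in_a, value_in_b)} for fields that differ."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


# Listings copied from the tessdata_best and tessdata_fast runs above.
best = parse_lstm_listing(
    "LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1], "
    "int_mode=0, recoding=1, iteration=814100, sample_iteration=814136, "
    "null_char=110, learning_rate=0.001, momentum=0.5, adam_beta=0.999"
)
fast = parse_lstm_listing(
    "LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1], "
    "int_mode=1, recoding=1, iteration=6352400, sample_iteration=6352704, "
    "null_char=110, learning_rate=0.001, momentum=0.5, adam_beta=0.999"
)

for name, (best_value, fast_value) in sorted(differing_fields(best, fast).items()):
    print(f"{name}: best={best_value} fast={fast_value}")
```

All values are kept as strings on purpose, so the sketch makes no assumption about how each field should be interpreted.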

Shreeshrii · 2021-01-15T17:33:59Z

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata_best/script/Devanagari.traineddata
LSTM: network=[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx64Lrx64Lfx512O1c1], int_mode=0, recoding=1, iteration=2475100, sample_iteration=2475272, null_char=2, learning_rate=0.001, momentum=0.5, adam_beta=0.999

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata_best/San20201216.traineddata
LSTM: network=[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c285], int_mode=0, recoding=1, iteration=1579800, sample_iteration=1584900, null_char=285, learning_rate=0.002, momentum=0.5, adam_beta=0.999

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l ~/tessdata_best/Sanskrit20201231.traineddata
LSTM: network=[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx64Lrx64Lfx512O1c1], int_mode=0, recoding=1, iteration=3466500, sample_iteration=3466581, null_char=2, learning_rate=0.001, momentum=0.5, adam_beta=0.999

(base) ubuntu@tesseract-ocr-1:~/tesseract$ combine_tessdata -l /home/ubuntu/tesstrain-San/data/SanLayer/tessdata_best/SanLayer_0.686_354028_1278900.traineddata
LSTM: network=[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx64Lrx64Lfx512O1c1][Lfx192O1c1], int_mode=0, recoding=1, iteration=1278900, sample_iteration=1278900, null_char=2, learning_rate=0.0001, momentum=0.5, adam_beta=0.999

stweil · 2021-01-15T17:40:26Z

Why does San20201216.traineddata have a network string with blanks? SanLayer_0.686_354028_1278900.traineddata also has an unusual network string. How did you get those?

stweil · 2021-01-15T17:41:21Z

Should I merge this pull request, or should I wait for more feedback?

stweil · 2021-01-15T17:42:34Z

Now the int_mode can also be queried for START_MODEL in lstmtraining to give a user-friendly error message.

That should be implemented in lstmtraining (which can also query int_mode). It might also be possible to convert fast models back to "best" models on the fly, so I could imagine an lstmtraining which can use a fast start model.
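Until lstmtraining performs such a check itself, a wrapper script could do it on the text printed by `combine_tessdata -l`. The following Python sketch is only an illustration: the helper names `is_integer_model` and `start_model_problem` are hypothetical, and it assumes that `int_mode=1` marks an integer ("fast") model that should not be used as a start model, which is what the listings for tessdata_best (int_mode=0) and tessdata_fast (int_mode=1) suggest.

```python
import re


def is_integer_model(listing):
    """Return True if the listing text reports a non-zero int_mode."""
    match = re.search(r"\bint_mode=(\d+)", listing)
    if match is None:
        raise ValueError("no int_mode field found in listing")
    return match.group(1) != "0"


def start_model_problem(name, listing):
    """Return a readable message if the model looks unsuitable, else None.

    Assumption: an integer model cannot serve as START_MODEL for training.
    """
    if is_integer_model(listing):
        return (
            f"{name} is an integer (fast) model (int_mode=1); "
            "use a float model such as one from tessdata_best as START_MODEL."
        )
    return None


# The listing text would normally be captured from `combine_tessdata -l`;
# here shortened excerpts from the thread are used directly.
fast_listing = "LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1], int_mode=1, recoding=1"
best_listing = "LSTM: network=[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1], int_mode=0, recoding=1"

for name, listing in (("fast", fast_listing), ("best", best_listing)):
    print(name, "->", start_model_problem(name, listing) or "ok")
```

Keeping the check as a pure function on text means it can be tested without a Tesseract installation.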

Shreeshrii · 2021-01-15T17:49:05Z

Should I merge this pull request, or should I wait for more feedback?

Please merge. Thank you.

Shreeshrii · 2021-01-15T17:59:30Z

Why does San20201216.traineddata have a network string with blanks? SanLayer_0.686_354028_1278900.traineddata also has an unusual network string. How did you get those?

SanLayer_0.686_354028_1278900.traineddata
LSTM: network=[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx64Lrx64Lfx512O1c1][Lfx192O1c1],

I used --append_index 5 --net_spec '[Lfx192O1c1]'. This is the "replace top layer" training that Ray describes in the tutorials.

Why does San20201216.traineddata have a network string with blanks?

While comparing the different layer values, I had added spaces to the network spec for readability and did not remove them before starting training.

NET_SPEC := [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c\#\#\#]
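To compare network strings without adding blanks to the spec itself, the string can be split into layers for display only. The Python sketch below is a purely textual split and not Tesseract's own spec parser; the function name `split_net_spec` is hypothetical, and the pattern assumes that each layer starts with an uppercase letter, as in the specs shown in this thread.

```python
import re


def split_net_spec(spec):
    """Split a network spec string into the input shape and layer tokens.

    Whitespace is ignored, so a spec written with blanks yields the same
    tokens as the compact form. Every bracketed group is processed in order.
    """
    compact = "".join(spec.split())
    tokens = []
    for group in re.findall(r"\[([^\]]*)\]", compact):
        # Either a leading comma-separated number list (the input shape)
        # or a layer: one uppercase letter plus lowercase/digits/commas.
        tokens.extend(re.findall(r"\d+(?:,\d+)*|[A-Z][a-z0-9,]*", group))
    return tokens


spaced = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c285]"
compact = "[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c285]"

print(split_net_spec(spaced) == split_net_spec(compact))
for token in split_net_spec(compact):
    print(token)
```

A spec with an appended group, such as the SanLayer string above, simply yields the tokens of both bracketed groups one after the other.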

stweil added 3 commits January 14, 2021 22:20

Replace STRING by char* in LSTMRecognizer

65e7a57

Signed-off-by: Stefan Weil <[email protected]>

Replace STRING by std::string for LSTMRecognizer::network_str_

ea10c86

Signed-off-by: Stefan Weil <[email protected]>

Add new option -l for combine_tessdata to list the network string

0bbc43c

Signed-off-by: Stefan Weil <[email protected]>

Add more information shown by combine_tessdata -l

6361246

Signed-off-by: Stefan Weil <[email protected]>

stweil changed the title ~~Add new option -l for combine_tessdata to list the network string~~ Add new option -l for combine_tessdata to list LSTM information Jan 14, 2021

stweil merged commit c7baf8f into tesseract-ocr:master Jan 15, 2021

stweil deleted the network-string branch January 15, 2021 17:49

amitdo mentioned this pull request Apr 8, 2021

Model Specification using combine_tessdata #3042

Closed

amitdo added the enhancement label Apr 8, 2021
